0) Load data

The goal of this ChIP-Seq analysis was to analyze two ChIP-Seq data sets and compare the results.

The two data sets chosen for this ChIP-Seq analysis are PRDM1 and ZNF414.

Gene PRDM1 encodes PR domain zinc finger protein 1, also known as B lymphocyte-induced maturation protein-1 (BLIMP-1). This protein is expressed in both B and T cells and plays a significant role in B cell development and antibody production.

Gene ZNF414 encodes Zinc Finger Protein 414. Its function is not clear, but it may be involved in transcriptional regulation.

These two data sets were chosen because an analysis of significant overlaps could provide more insight into how these two proteins may form a complex or interact in regulating chromatin remodelling or gene expression.

The two data sets analyzed were downloaded from ENCODE: PRDM1 (data set, peaks) and ZNF414 (data set, peaks).

The peak files for PRDM1 and ZNF414 were downloaded in BED narrowPeak format and read in with the function readPeakFile from the R package ChIPseeker for further analysis.

Both peak files contained IDR-thresholded peaks and used genome assembly GRCh38.
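The following is a minimal Python sketch of parsing the narrowPeak format (BED6+4: chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak). It illustrates what readPeakFile consumes and is not ChIPseeker's implementation, which returns a GRanges object. The two sample lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NarrowPeak:
    chrom: str
    start: int           # 0-based, inclusive (BED convention)
    end: int             # 0-based, exclusive
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float       # -log10(p-value), -1 if not available
    q_value: float       # -log10(q-value), -1 if not available
    peak: int            # summit offset from start, -1 if not called

    def summit(self) -> int:
        """Absolute summit position, falling back to the interval midpoint."""
        if self.peak >= 0:
            return self.start + self.peak
        return (self.start + self.end) // 2


def parse_narrowpeak(lines: Iterable[str]) -> List[NarrowPeak]:
    peaks = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and UCSC track/browser or comment headers
        if not line or line.startswith(("#", "track", "browser")):
            continue
        f = line.split("\t")
        if len(f) != 10:
            raise ValueError(f"expected 10 columns, got {len(f)}: {line!r}")
        peaks.append(NarrowPeak(
            f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
            float(f[6]), float(f[7]), float(f[8]), int(f[9]),
        ))
    return peaks


example = [  # hypothetical peaks
    "chr1\t1000\t1500\tpeak1\t1000\t.\t12.5\t30.2\t27.1\t240",
    "chr2\t5000\t5300\tpeak2\t850\t.\t8.1\t15.0\t12.3\t-1",
]
peaks = parse_narrowpeak(example)
print([(p.chrom, p.summit()) for p in peaks])  # [('chr1', 1240), ('chr2', 5150)]
```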

1) Venn diagram comparing the overlap of binding sites between PRDM1 and ZNF414

Both PRDM1 and ZNF414 showed a large number of peaks, and 2106 peaks overlapped between the two.
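An overlap count depends on the direction of comparison and on the overlap rule used. The following minimal Python sketch counts the peaks in one set that overlap at least one peak in the other, using half-open BED intervals. It is not necessarily how the Venn diagram tool computed the 2106 figure. The interval lists are hypothetical toy data.

```python
from bisect import bisect_left
from collections import defaultdict


def count_overlapping(a, b):
    """Count intervals in `a` that overlap at least one interval in `b`.

    Intervals are (chrom, start, end) tuples with half-open [start, end)
    coordinates, as in BED files.
    """
    # Per chromosome: starts of b sorted ascending, plus the running maximum end
    grouped = defaultdict(list)
    for chrom, start, end in b:
        grouped[chrom].append((start, end))
    index = {}
    for chrom, ivs in grouped.items():
        ivs.sort()
        starts, max_ends, running = [], [], float("-inf")
        for start, end in ivs:
            running = max(running, end)
            starts.append(start)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)

    count = 0
    for chrom, start, end in a:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        k = bisect_left(starts, end)  # number of b intervals with b_start < end
        # An overlap exists if one of those intervals ends after `start`
        if k > 0 and max_ends[k - 1] > start:
            count += 1
    return count


prdm1 = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 60)]    # hypothetical
znf414 = [("chr1", 150, 250), ("chr1", 400, 500), ("chr2", 10, 20)]   # hypothetical
print(count_overlapping(prdm1, znf414))  # 1
```

Adjacent intervals such as [300, 400) and [400, 500) do not count as overlapping under the half-open convention.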

2) Metaplot of PRDM1 and ZNF414 around the transcription start sites (TSS)

2.1) Heatmap of ChIP binding to TSS regions

To calculate the profile of ChIP peaks binding to TSS regions, we first prepared the TSS regions, which are defined as the flanking sequences around the TSSs. We then aligned the peaks that mapped to these regions.

PRDM1///ZNF414 denotes the overlaps between PRDM1 and ZNF414.
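The alignment step can be sketched as a "tag matrix." Each row is one TSS window, and each column is one position from -flank to +flank. Rows for minus-strand genes are reversed so that upstream is always on the left. The heatmap in 2.1 plots this matrix, and the average profile in 2.2 averages it over rows.

This is a minimal Python sketch, not ChIPseeker's code. In ChIPseeker, functions such as getTagMatrix and plotAvgProf presumably serve this purpose, and ChIPseeker may normalize the profile differently from the plain per-position mean used here. The flank of 3000 bp and the toy inputs are assumptions.

```python
def tag_matrix(peaks, tss_sites, flank=3000):
    """Peak coverage at each position from -flank to +flank around each TSS.

    peaks: (chrom, start, end) half-open intervals.
    tss_sites: (chrom, position, strand) tuples.
    Rows are oriented 5'->3' relative to the gene, so minus-strand rows are reversed.
    """
    matrix = []
    for chrom, tss, strand in tss_sites:
        lo, hi = tss - flank, tss + flank  # inclusive window
        row = [0] * (2 * flank + 1)
        for p_chrom, p_start, p_end in peaks:
            if p_chrom != chrom:
                continue
            # Clip the peak (last base is p_end - 1) to the window
            for pos in range(max(p_start, lo), min(p_end - 1, hi) + 1):
                row[pos - lo] += 1
        if strand == "-":
            row.reverse()
        matrix.append(row)
    return matrix


def average_profile(matrix):
    """Mean peak coverage per position across all TSS windows."""
    return [sum(col) / len(matrix) for col in zip(*matrix)]


m = tag_matrix([("chr1", 98, 101)], [("chr1", 100, "+"), ("chr1", 100, "-")], flank=5)
print(m[0])                # [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(m[1])                # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
print(average_profile(m))  # [0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 0.5, 0.5, 0.0, 0.0, 0.0]
```

The nested loop is quadratic in the number of peaks and TSSs. That is fine for illustration, but real data would need an interval index.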

2.2) Average Profile of ChIP peaks binding to TSS region

3) Annotate the peaks for genomic features such as intron, exon, 3’ UTR, etc., and compare the annotations between PRDM1 and ZNF414

First, we summarized the distribution of the PRDM1 and ZNF414 peaks over different types of features such as exon, intron, enhancer, proximal promoter, 5’ UTR and 3’ UTR.
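A peak can overlap several features at once, for example an exon that is also a 5’ UTR. Annotation tools therefore usually assign each peak a single feature by priority before computing percentages.

The following minimal Python sketch shows that step. The priority order and feature names are assumptions modeled on ChIPseeker's default genomicAnnotationPriority. They do not match every category listed above, such as enhancer, and the peaks are hypothetical.

```python
from collections import Counter

# Assumed priority order, modeled on ChIPseeker's default genomicAnnotationPriority
PRIORITY = ["Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"]


def assign_feature(overlapping):
    """Pick the highest-priority feature among those a peak overlaps."""
    for feature in PRIORITY:
        if feature in overlapping:
            return feature
    return "Intergenic"  # no annotated feature overlaps the peak


def feature_distribution(peak_features):
    """Percentage of peaks assigned to each feature type."""
    labels = [assign_feature(f) for f in peak_features]
    counts = Counter(labels)
    return {f: 100 * counts[f] / len(labels) for f in PRIORITY if counts[f]}


peaks = [{"Exon", "5UTR"}, {"Intron"}, set(), {"Promoter", "Intron"}]  # hypothetical
print(feature_distribution(peaks))
# {'Promoter': 25.0, '5UTR': 25.0, 'Intron': 25.0, 'Intergenic': 25.0}
```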

We also checked the genomic element distribution for the overlaps.

We created UpSet plots for PRDM1, ZNF414 and the overlaps.

We calculated the percentage of binding sites upstream and downstream of the TSS of the nearest genes and visualized the distribution.
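"Upstream" and "downstream" are defined relative to the gene's strand. For a minus-strand gene, a peak at a smaller coordinate than the TSS is downstream.

The following minimal Python sketch computes the percentages from a signed, strand-aware distance. Measuring from the peak midpoint is an assumption. Annotation tools may instead measure from the nearest peak boundary. The records are hypothetical.

```python
def signed_distance(peak_mid, tss, strand):
    """Distance from the TSS in the gene's orientation (negative = upstream)."""
    d = peak_mid - tss
    return d if strand == "+" else -d


def up_down_percentages(records):
    """records: (peak_midpoint, nearest_gene_tss, gene_strand) per peak."""
    dists = [signed_distance(*r) for r in records]
    n = len(dists)
    up = sum(d < 0 for d in dists)
    down = sum(d > 0 for d in dists)
    return {
        "upstream": 100 * up / n,
        "downstream": 100 * down / n,
        "at_tss": 100 * (n - up - down) / n,
    }


records = [(900, 1000, "+"), (1100, 1000, "+"), (900, 1000, "-"), (1000, 1000, "+")]
print(up_down_percentages(records))
# {'upstream': 25.0, 'downstream': 50.0, 'at_tss': 25.0}
```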

4) Assign peaks to genes – then perform pathway enrichment.

For the enrichment analysis of the annotated peaks, we used the function enricher from the R package clusterProfiler together with the Homo sapiens ontology gene sets from the R package msigdbr. Finally, we filtered the resulting pathways to keep only terms with an adjusted p-value smaller than or equal to 0.05.
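clusterProfiler's enricher performs an over-representation analysis. For each gene set, a hypergeometric test asks whether the query genes hit that set more often than chance would predict. The p-values are then corrected for multiple testing, by default with Benjamini-Hochberg.

The following is a minimal Python sketch of that logic, not enricher itself. It omits the gene-set size limits and q-value cutoff that enricher also applies. It assumes the universe is simply a given set of genes. Gene and set names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the set, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def enrich(query, gene_sets, universe, cutoff=0.05):
    """Over-representation test of `query` against each gene set; keeps padj <= cutoff."""
    query = set(query) & universe
    N, n = len(universe), len(query)
    names, pvals = [], []
    for name, genes in gene_sets.items():
        genes = set(genes) & universe
        k = len(query & genes)
        if k == 0:
            continue
        names.append(name)
        pvals.append(hypergeom_sf(k, N, len(genes), n))
    padj = bh_adjust(pvals)
    return {name: p for name, p in zip(names, padj) if p <= cutoff}


universe = {f"g{i}" for i in range(20)}  # hypothetical gene universe
gene_sets = {"SET_A": [f"g{i}" for i in range(5)], "SET_B": [f"g{i}" for i in range(10, 15)]}
result = enrich([f"g{i}" for i in range(5)], gene_sets, universe)
print(result)  # only SET_A is kept, with padj = 1 / comb(20, 5)
```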

4.1) What are genes shared in the overlap between PRDM1 and ZNF414?

Table 1 shows the 1819 genes identified in the overlap between PRDM1 and ZNF414.

4.2) What are pathways/genesets shared in the overlap between PRDM1 and ZNF414 for ontology gene sets from R package msigdbr?

Table 2 shows the 155 pathways/genesets shared in the overlap between PRDM1 and ZNF414.

4.3) What pathways differ for ontology gene sets from R package msigdbr?

Table 3 shows the 37 pathways that differ between the overlap and PRDM1, i.e. pathways that are in the overlap but not in PRDM1.

Table 4 shows the 22 pathways that differ between the overlap and ZNF414, i.e. pathways that are in the overlap but not in ZNF414.
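The "differ" comparison in Tables 3 and 4 is a set difference between lists of significant terms. The following minimal Python sketch shows it with hypothetical term names.

```python
def pathways_only_in_overlap(overlap_terms, single_terms):
    """Significant terms for the overlap genes that are not significant for one factor alone."""
    return sorted(set(overlap_terms) - set(single_terms))


overlap = ["GO_A", "GO_B", "GO_C"]  # hypothetical significant terms for the overlap
prdm1 = ["GO_A", "GO_D"]            # hypothetical significant terms for PRDM1
print(pathways_only_in_overlap(overlap, prdm1))  # ['GO_B', 'GO_C']
```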

5) Enrichr

In addition to the ontology gene sets from the R package msigdbr, we performed a second gene set enrichment analysis. Gene Ontology provides a rather limited pathway annotation that can be hard to interpret for many applications. Therefore, we also performed a broad-spectrum pathway analysis with Enrichr.

The enrichr results for the 1819 overlap genes can be found here: https://maayanlab.cloud/Enrichr/enrich?dataset=a64a28f137dea51300456970b16d9296

Subsequently, we took a closer look at the Enrichr results. Most of the enriched terms came from libraries related to transcription factors. We therefore focused on the Enrichr library Enrichr_Submissions_TF-Gene_Coocurrence, which provided interesting results.

The terms found for this library were filtered to keep only those with an adjusted p-value smaller than or equal to 0.05 and a combined score greater than 100.

The combined score c is computed as c = ln(p) * z, where p is the p-value computed with Fisher’s exact test and z is the z-score that measures the deviation from the expected rank.
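The following minimal Python sketch applies the combined-score formula and the two filtering thresholds to hypothetical values.

It assumes that z values are negative for enriched terms, as Enrichr reports them. Because ln(p) is negative for p < 1, this makes c positive for strongly enriched terms. The term names and numbers are made up.

```python
import math


def combined_score(p, z):
    """Enrichr-style combined score c = ln(p) * z."""
    return math.log(p) * z


def filter_terms(terms, padj_cutoff=0.05, score_cutoff=100):
    """Keep terms with adjusted p <= padj_cutoff and combined score > score_cutoff."""
    kept = []
    for t in terms:
        c = combined_score(t["p"], t["z"])
        if t["padj"] <= padj_cutoff and c > score_cutoff:
            kept.append((t["term"], c))
    return sorted(kept, key=lambda item: item[1], reverse=True)


terms = [  # hypothetical values
    {"term": "TF_A", "p": 1e-10, "padj": 1e-8, "z": -10.0},
    {"term": "TF_B", "p": 1e-3, "padj": 0.02, "z": -5.0},
]
result = filter_terms(terms)
print(result)  # only TF_A, with c = ln(1e-10) * -10, about 230.26
```

TF_B passes the adjusted p-value threshold but is dropped because its combined score is only about 34.5.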

5.1) What are pathways/genesets shared in the overlap between PRDM1 and ZNF414 for database Enrichr_Submissions_TF-Gene_Coocurrence from enrichr?

Table 5 shows the 372 pathways/genesets shared in the overlap between PRDM1 and ZNF414.

5.2) What pathways differ for database Enrichr_Submissions_TF-Gene_Coocurrence from enrichr?

Table 6 shows the 95 pathways that differ between the overlap and PRDM1, i.e. pathways that are in the overlap but not in PRDM1.

Table 7 shows the 143 pathways that differ between the overlap and ZNF414, i.e. pathways that are in the overlap but not in ZNF414.

6.1) What is your interpretation of these results?

The goal of this ChIP-Seq analysis was to analyze the ChIP-Seq data sets PRDM1 and ZNF414.

As mentioned before, gene PRDM1 encodes PR domain zinc finger protein 1, also known as B lymphocyte-induced maturation protein-1 (BLIMP-1), which plays a significant role in B cell development and antibody production. Gene ZNF414 encodes Zinc Finger Protein 414, whose function is not clear but which may be involved in transcriptional regulation.

These two data sets were chosen because an analysis of significant overlaps could provide more insight into how these two proteins may form a complex or interact in regulating chromatin remodelling or gene expression.

As shown in the Venn diagram, we found 2106 peaks in the overlap between PRDM1 and ZNF414.

The genomic element distribution for the overlaps showed the following: at the exon level, 53.3% belonged to 5’ UTRs; at the exon/intron/intergenic level, 68.1% belonged to exons; at the gene level, 53.9% belonged to promoters; and at the promoter level, 46.6% belonged to TSS - 500b.

Overall, our results show that the identified overlaps between PRDM1 and ZNF414 lie mostly within genes (largely in exons, in particular 5’ UTRs) and close to the TSS. This is an interesting result, considering that the interplay of the two in regulating chromatin remodelling or gene expression is unknown.

We then looked at the top 5 pathways from our gene set enrichment analysis of the overlaps between PRDM1 and ZNF414.

We start with the pathways identified from the ontology gene sets of the R package msigdbr:

We can see that two of them belong to RNA processes:

One belongs to the cellular component nuclear envelope:

And two belong to ncRNA processes:

This could indicate that the identified overlaps between PRDM1 and ZNF414 consist of regions containing genes involved in transcription. This seems to support the assumption that ZNF414 may be involved in transcriptional regulation.

This assumption is further supported by the pathways that differ between the overlaps and the individual PRDM1 and ZNF414 data sets.

The pathways that differ between the overlaps and PRDM1 seem to relate mainly to chromatin, chromosome and transcriptional processes, e.g., covalent chromatin modification, regulation of chromosome organization and ribosome.

The pathways that differ between the overlaps and ZNF414 show partially similar results, e.g., regulation of posttranscriptional gene silencing, but also unrelated terms such as abnormal uterus morphology.

Overall, most of the identified pathways in the overlaps between PRDM1 and ZNF414 seem to be related to transcriptional regulation.

As stated before, most of the pathways identified for the overlaps between PRDM1 and ZNF414 with the broad-spectrum Enrichr analysis belonged to libraries related to transcription factors. Examples are the Enrichr libraries Enrichr_Submissions_TF-Gene_Coocurrence, ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X and ENCODE TF ChIP-seq 2015.

Enrichr_Submissions_TF-Gene_Coocurrence contains 1722 terms, has a gene coverage of 12486 genes and has 299 genes per term.

TF stands for transcription factor. In molecular biology, a TF is a protein that binds to specific DNA sequences and regulates transcription, for example by helping RNA polymerase initiate it. Co-occurrence networks are used to describe potential relationships between entities, in this case TFs.

Examples from the top 10 Enrichr_Submissions_TF-Gene_Coocurrence terms are:

Overall, our results strongly indicate that the identified overlaps between PRDM1 and ZNF414 are related to transcriptional regulation.

6.2) What future directions could you propose to follow up on these findings?

One direction for future research would be to investigate how the overlapping binding sites of PRDM1 and ZNF414, which appear to be involved in transcriptional regulation, play a role in B cell development and antibody production. ZNF414 may be more important for the regulation of PRDM1 than previously thought.